Living Up to Expectations: Computing Expert Responses
In cooperative man-machine interaction, it is necessary but not sufficient for a system to respond truthfully and informatively to a user's question. In particular, if the system has reason to believe that its planned response might mislead the user, then it must block that conclusion by modifying its response. This paper focuses on identifying and avoiding potentially misleading responses by acknowledging types of 'informing behavior' usually expected of an expert. We attempt to give a formal account of several types of assertions that should be included in response to questions concerning the achievement of some goal (in addition to the simple answer), lest the questioner otherwise be misled.
Predictive Engagement: An Efficient Metric For Automatic Evaluation of Open-Domain Dialogue Systems
User engagement is a critical metric for evaluating the quality of
open-domain dialogue systems. Prior work has focused on conversation-level
engagement by using heuristically constructed features such as the number of
turns and the total time of the conversation. In this paper, we investigate the
possibility and efficacy of estimating utterance-level engagement and define a
novel metric, predictive engagement, for automatic evaluation of
open-domain dialogue systems. Our experiments demonstrate that (1) human
annotators have high agreement on assessing utterance-level engagement scores;
(2) conversation-level engagement scores can be predicted from properly
aggregated utterance-level engagement scores. Furthermore, we show that the
utterance-level engagement scores can be learned from data. These scores can
improve automatic evaluation metrics for open-domain dialogue systems, as shown
by their correlation with human judgements. This suggests that predictive engagement
can be used as real-time feedback for training better dialogue models.
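A minimal sketch of the aggregation idea, assuming mean pooling as the aggregation function; the scorer, names, and placeholder heuristic below are illustrative assumptions, not the paper's model.

```python
# Sketch: predict conversation-level engagement by aggregating
# utterance-level scores (mean pooling assumed for illustration).
from statistics import mean
from typing import Callable, List

def conversation_engagement(
    utterances: List[str],
    score_utterance: Callable[[str], float],  # a learned regressor in practice
) -> float:
    """Aggregate per-utterance engagement scores into one conversation score."""
    if not utterances:
        return 0.0
    return mean(score_utterance(u) for u in utterances)

# Stand-in scorer for demonstration only (a trained model in the paper):
toy_scorer = lambda u: min(1.0, len(u.split()) / 20)
print(conversation_engagement(["Hi!", "Tell me more about jazz history."], toy_scorer))
```

Mean pooling is only one choice; the abstract's "properly aggregated" leaves room for weighted or learned aggregation.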
Remember what you did so you know what to do next
We explore using a moderately sized large language model (GPT-J, 6B
parameters) to create a plan for a simulated robot to achieve 30 classes of
goals in ScienceWorld, a text game simulator for elementary science
experiments. Previously published empirical work (Wang et al., 2022) claimed
that large language models (LLMs) are a poor fit compared to reinforcement
learning. Using the Markov assumption (a single previous step), the LLM
outperforms the reinforcement learning-based approach by a factor of 1.4. When
we fill the LLM's input buffer with as many prior steps as possible,
improvement rises to 3.5x. Even when training on only 6.5% of the training
data, we observe a 2.2x improvement over the reinforcement-learning-based
approach. Our experiments show that performance varies widely across the 30
classes of actions, indicating that averaging over tasks can hide significant
performance issues. In work contemporaneous with ours, Lin et al. (2023)
demonstrated a two-part approach (SwiftSage) that uses a small LLM (T5-large)
complemented by OpenAI's massive LLMs to achieve outstanding results in
ScienceWorld. Our 6B-parameter, single-stage GPT-J matches the performance of
SwiftSage's two-stage architecture when the latter incorporates GPT-3.5 Turbo,
which has 29 times more parameters than GPT-J.
Comment: Identical to EMNLP 2023 Findings.
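A hedged sketch of the history-packing idea the abstract describes: fill the prompt with as many prior (observation, action) steps as fit the context budget. The prompt format, token counter, and budget below are assumptions, not the authors' exact setup.

```python
# Sketch: pack as many prior steps as fit in the context window,
# keeping the most recent ones when space runs out.
from typing import Callable, List, Tuple

def build_prompt(
    goal: str,
    history: List[Tuple[str, str]],       # (observation, action), oldest first
    count_tokens: Callable[[str], int],   # tokenizer-dependent counter
    budget: int = 2048,                   # e.g., GPT-J's context size
) -> str:
    header = f"Goal: {goal}\n"
    used = count_tokens(header)
    kept: List[str] = []
    # Walk backwards so the most recent steps survive truncation.
    for obs, act in reversed(history):
        step = f"Observation: {obs}\nAction: {act}\n"
        cost = count_tokens(step)
        if used + cost > budget:
            break
        kept.append(step)
        used += cost
    return header + "".join(reversed(kept)) + "Next action:"

# Example with a trivial whitespace token counter:
prompt = build_prompt("boil water",
                      [("You are in the kitchen.", "take pot")],
                      lambda s: len(s.split()))
```

Shrinking the budget until only one step fits recovers the Markov (single-previous-step) setting the abstract compares against.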
BOND: BERT-Assisted Open-Domain Named Entity Recognition with Distant Supervision
We study the open-domain named entity recognition (NER) problem under distant
supervision. Distant supervision, though it does not require large amounts of
manual annotation, yields highly incomplete and noisy labels from external
knowledge bases. To address this challenge, we propose a new computational
framework, BOND, which leverages the power of pre-trained language models
(e.g., BERT and RoBERTa) to improve the prediction performance of NER models.
Specifically, we propose a two-stage training algorithm: in the first stage,
we adapt the pre-trained language model to the NER task using the distant
labels, which can significantly improve recall and precision; in the second
stage, we drop the distant labels and propose a self-training approach to
further improve model performance. Thorough experiments on 5
benchmark datasets demonstrate the superiority of BOND over existing distantly
supervised NER methods. The code and distantly labeled data have been released
at https://github.com/cliang1453/BOND.
Comment: Proceedings of the 26th ACM SIGKDD Conference on Knowledge Discovery
and Data Mining (KDD '20).
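A minimal sketch of the two-stage recipe the abstract outlines; `fine_tune`, `predict`, the round count, and the confidence threshold are illustrative assumptions, not BOND's exact implementation.

```python
# Sketch: stage 1 adapts a pre-trained LM to NER with noisy distant
# labels; stage 2 drops them and self-trains on confident pseudo-labels.
from typing import Callable, List, Sequence, Tuple

Labels = List[str]  # per-token tags, e.g. ["B-PER", "O", ...]

def train_bond_style(
    model,
    sentences: Sequence[str],
    distant_labels: Sequence[Labels],
    fine_tune: Callable,     # (model, sentences, labels) -> model
    predict: Callable,       # (model, sentence) -> (Labels, float confidence)
    rounds: int = 3,         # assumed number of self-training rounds
    threshold: float = 0.9,  # assumed pseudo-label confidence cutoff
):
    # Stage 1: adapt the pre-trained LM to NER using the distant labels.
    model = fine_tune(model, sentences, distant_labels)

    # Stage 2: drop the distant labels; iterate self-training on the
    # model's own high-confidence predictions (pseudo-labels).
    for _ in range(rounds):
        pseudo: List[Tuple[str, Labels]] = []
        for sent in sentences:
            labels, confidence = predict(model, sent)
            if confidence >= threshold:
                pseudo.append((sent, labels))
        if not pseudo:
            break
        model = fine_tune(model,
                          [s for s, _ in pseudo],
                          [l for _, l in pseudo])
    return model
```

In practice self-training of this kind often uses soft teacher labels and may re-initialize the student each round; this sketch shows only the control flow.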